ABSTRACT

An analysis was made of the data from a single
criterion-referenced test which was constructed to measure a
hierarchy of skills in listening and reading comprehension and which
was administered to 1,186 subjects in grades 2, 5, 8, and 11. The
research was concerned with the applicability of a hierarchically
ordered achievement test to the diagnosis and assessment of listening
and reading skills. Performance objectives were derived from 11
language comprehension skills and arranged in a hierarchical
structure through the use of an heuristic procedural analysis. The
design of the test was based on the theoretical constructs of Gagné.
Paired items were scored on a yes/no basis and a percentage of
probable response calculated. Correlations for paired test-retest
listening scores ranged from .86 to .99 for the 11 skills. Paired
test-retest reading score correlations ranged from .80 to .96. No
serious bias was noted in the order of item presentation. Data were
furnished on the development, validity and reliability,
interpretations, and uses of the test. Tables and references are
included. (WB)

A study was conducted to analyze the data from a criterion-referenced
test which was constructed to measure a hierarchy of skills in
listening comprehension and reading comprehension. The design of the
test was derived from a theoretical position that a learning
hierarchy, involving the notion of positive transfer of learning,
represents a set of specified intellectual skills which are ordered
from more simple to more complex and which are also predicted to
exhibit relationships compatible with the hypothesis of transfer from
lower- to higher-level skills.

Eleven language comprehension skills were defined as perfor- 
mance objectives and arranged in a hierarchical structure through 
the use of an heuristic procedural analysis. Two parallel items 
were constructed to yield pass-fail information for each of the 11 
skills. The content of these items was intended to sample situa- 
tions relevant to each of the 11 skills, and in this sense to comprise 
a definition of these skills. Since the test was intended to measure
the 11 skills at four grade levels, an attempt was made to control the
intrinsic difficulty of items by means of appropriate language
complexity, vocabulary, and interest.

The test was administered to 1,186 subjects at the second,
fifth, eighth, and eleventh grades. Results for the four grades
studied may be summarized relative to research questions that
were formulated to investigate the 11 cognitive skills and their
associated performance effects. Differences were obtained in the
degree of consistency of measures of listening and reading, as
shown by pass-fail measures on two items for each skill, with a
high proportion reaching a level of consistency of .70. Correlations
of .86, .86, .92, .98, .98, .88, .96, .94, .66, .65, and .99 for each
of the 11 skills, respectively, were found for paired test-retest
listening scores. Correlations of .80, .88, .90, .96, .98, .90, .95,
.92, .72, .78, and .96 for each of the 11 skills, respectively, were
found for paired test-retest reading scores. Correlations of .97,
.95, .92, and .95 for grades two, five, eight, and eleven,
respectively, were found between an original testing order and a
scrambled testing order, indicating no serious bias arising from
this variable.

Conditional probabilities of correct responses to listening 
measures indicated ordered patterns of predictable relationships 
of lower-level to higher-level skills. Scalogram analyses yielded
reproducibility coefficients of .88, .86, .80, and .83 for grades
two, five, eight, and eleven, respectively, indicating the extent to 
which test scores can be predicted to fit the model of ordered pat- 
terns of difficulty. 

In conclusion, a study was conducted by administering a test 
with four levels in oral and printed form to subjects in grades two, 
five, eight, and eleven. This research was primarily concerned 
with the investigation, according to theoretical considerations, of 
a hierarchy of intellectual skills in listening comprehension and
reading comprehension. Data were also furnished on the develop- 
ment, validity and reliability, interpretations, and uses of the test. 

A study was conducted to analyze the data from a single
criterion-referenced test which was constructed to measure a hierarchy
of skills in listening comprehension and reading comprehension,
administered to 1,186 subjects at the second, fifth, eighth, and
eleventh grades. Based upon specific language skills which were
identified and defined as performance objectives, the test was also
designed to discover the sequence and the predictable relationships
among the comprehension skills. Undoubtedly many factors contribute to
the learning of such skills. However, this research was primarily
concerned with the applicability of theoretical constructs proposed by
Gagné (1965) to the measurement of the initial capabilities of the
learner in relation to the order and dependence of listening
comprehension skills and of reading comprehension skills.

The design of the test was derived from a theoretical position that
a learning hierarchy, involving the notion of positive transfer of
learning, represents a set of specified intellectual skills which are
ordered from more simple to more complex and which are also predicted
to exhibit relationships compatible with the hypothesis of transfer
from lower- to higher-level skills. Specifically, this research was
concerned with the applicability of a hierarchically ordered test of
achievement to the diagnosis and assessment of listening skills and
reading skills.

The problem of identifying and measuring the effects of prior learning
on performance of listening skills and reading skills is related to the
question of sequence of skills and positive transfer among the skills.

In a recent review of studies concerned with conditions for
instructional psychology, Gagné & Rohwer (1969) noted that there is
considerable evidence for the notion that the learning of particular
classes of tasks depends in a positive transfer sense on the prior
learning of other particular classes of performance. Specifically, the
authors concluded that learning verbal associations typically receives
much positive transfer from prior discrimination learning, stimulus
coding, and response integration; concept learning from prior learning
on dimension discrimination; rule learning from prior concept
learning; and problem solving from prior learning of relevant rules.
Gagné's (1965) hypothesis that certain kinds of learning are necessary
prerequisites, i.e., transfer positively, to other kinds of learning
suggests that the hierarchical nature of learning tasks is one of the
critical conditions of learning complex performances.

Several studies have attempted to identify and analyze the
hierarchical processes involved in learning tasks (Gagné & Paradise,
1961; Gagné, 1962; Gagné et al., 1965; Gibson, 1965; Schutz, Baker, &
Gerlach, 1965; Cox & Graham, 1966). Recent studies in the category of
concept learning and rule learning are concerned with identifying
kinds of prior learning which contribute to, i.e., transfer positively
to, the learning of a given class of performance (Marchbanks & Levin,
1965; McNeil & Stone, 1965; Samuels & Jeffrey, 1966; Kingsley & Hall,
1967; Beilin, Kagan, & Rabinowitz, 1966).






Although there has been considerable research upon the question of 
sequence of skills and positive transfer among the skills, relatively 
little research has been conducted relating to the identification of the 
sequence and transfer among the language comprehension skills of listening
and reading. Referring in a recent review of research to serious
questions concerning what is known about listening, Devine (1967) stated 
that studies of measurement are needed to support assumptions about the 
listening process. Davis (1967) noted that there has been a surprisingly 
small number of experimental studies in comprehension despite the long 
standing interest in reading as a thought process. Observing that stan- 
dardized reading tests often mask some of the important outcomes of 
instruction because they measure a conglomerate of skills and abilities 
at the same time, Chall (1967) pointed to the need for single component 
tests of skills, particularly of reading comprehension skills. 

The theoretical setting for the present study involved the idea that
certain language comprehension skills might be analyzed as intellectual
strategies which identify and define the processes in listening
comprehension and in reading comprehension. Specifically, intellectual
skills are distinguishable as hierarchical classes of component skills
on the basis of different outcome performances. It was theorized that
an analysis of the language comprehension process would identify a
hierarchy of skills requiring rule-using behaviors and problem-solving
behaviors which could be measured as intellectual skills and which are
mediators of positive transfer among themselves, ordered from more
simple to more complex. However, it should be emphasized that both
rule-using behavior and problem-solving behavior require that an
individual possess prerequisite capabilities, e.g., previously learned
concepts.

Procedure

A review of the literature yielded a list of the most important
listening and reading comprehension skills. It was desired that such
skills be operationally defined and measured as performance objectives
(Gagné, 1964; Mager, 1962; Tyler, 1951). Skills were sought which
would require rule-using behaviors and problem-solving behaviors for
their successful performance. Since investigators of both listening
comprehension and reading comprehension appeared to emphasize quite
similar intellectual skills as necessary in the language comprehension
process, a single test was designed to measure each of the skills in
two forms: oral and printed.

Using the list of skills expressed as performance objectives, it
appeared logical to construct a hypothetical hierarchy of skills by
attempting to answer the question suggested by Gagné's (1962)
procedural analysis for arranging skills in a hierarchical structure:
"What would an individual have to be able to do in order to perform
the final objective, e.g., listening comprehension or reading
comprehension?" The research question implied in such an analysis is
that if an individual is able to perform successfully a skill within
the hierarchy, he should also be able to perform successfully more
simple skills lower in the hierarchy. Subject to empirical findings, a
hierarchy of language comprehension skills from more simple to more
complex was identified and defined as follows: (1) identifying the
stated main idea; (2) providing examples by detail; (3) reinstating a
sequence of ideas; (4) inferring the main idea from specifics;
(5) identifying mood; (6) applying standards to judge persuasion;
(7) predicting the sequence of thought; (8) inferring connotative word
meaning; (9) identifying sequence ambiguities; (10) inferring
speaker's or writer's purpose; (11) judging logical validity.

Specifically, if an individual is able to infer a main idea from
specifics in a passage, it appeared reasonable to assume that he would
also be able to perform more simple skills, skills lower in the
hierarchy, e.g., reinstating a sequence of ideas, providing examples
of details, and identifying the stated main idea. In addition, the
investigator hypothesized that successfully judging the logical
validity of a passage probably indicated that an individual possessed
lower-level skills in the hierarchy. Empirical data were to be sought
in order to investigate the hypothesized order of the skills.

In order to measure an individual's ability to demonstrate performance
of each of the comprehension skills, two parallel items for each of the
eleven skills were constructed to yield pass-fail information on each of the
skills. The content of the items was intended to sample the situations 
about which conclusions could be drawn relative to criterion performance. 
Thus, the content of the test may be regarded as an explication of the 
eleven previously defined performance objectives. To eliminate spurious 
interrelationships among skill scores for items based on the same passage, 
each of the items was based on a different passage. 

The results of this study necessarily depend on the content validity 
of the items used. No statistical manipulation of data resulting from 
use of items lacking intrinsic validity can wholly make up for their 
fundamental inadequacy (Davis, 1967). Empirical findings relative to 
item performance were expected either to confirm the appropriateness 
of the items or to suggest revisions of the items. 

Since the test was intended to measure the 11 skills at four grade
levels--two, five, eight, and eleven--questions relative to the skills
which followed the passages were similar across the grade levels.
Avoiding difficult misleads, multiple choice responses were constructed
to measure the skills, since it was expected that empirical evidence
would demonstrate that items had performed as intended. Furthermore,
in an attempt to control intrinsic difficulty of content, items were
constructed to meet criteria of appropriate difficulty: (1) language
complexity--coordination and subordination determined by length and
number of sentences, (2) vocabulary--selected from word lists and
teachers' opinions, and (3) interest--judged by basal readers,
literature texts, and teachers' opinions.

Passages were constructed containing few, short sentences for grade
two; many, short sentences for grade five; few, long sentences for
grade eight; and many, long sentences for grade eleven. A passage with
few sentences contained six to nine. Short sentences included five to
twelve words and long sentences included eight to twenty-five words.
Vocabulary and interest were selected and judged appropriate from
texts commonly found in the schools of California and word lists,
e.g., Lorge & Thorndike (1944). These sources were reviewed by 28
teachers at the various grade levels on two occasions for interest and
acceptability.

The test consisted of four levels, A, B, C, and D, for grades two,
five, eight, and eleven, respectively. Subjects in each class, having
been randomly assigned to one of two testing conditions, took the test
in oral form--the Listening Test--and in printed form--the Reading
Test--in separate class sessions. Each of two parallel items was
scored pass-fail and performance of each skill was scored (1) pass,
if both items had been scored pass, and (2) fail, if one or both items
had been scored fail, making the maximum possible score 11.
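
The scoring rule just described can be sketched in a few lines. This is
only an illustration of the rule as stated; the function names and the
example responses are hypothetical and are not taken from the study.

```python
from typing import Sequence, Tuple


def score_skill(item_a: bool, item_b: bool) -> int:
    """Return 1 (pass) only if both parallel items were passed, else 0 (fail)."""
    return 1 if (item_a and item_b) else 0


def score_test(item_pairs: Sequence[Tuple[bool, bool]]) -> int:
    """Total score = number of skills passed (at most 11 for 11 skills)."""
    return sum(score_skill(a, b) for a, b in item_pairs)


# Hypothetical responses of one subject on the 11 skills (illustrative only):
# 4 skills with both items passed, 3 with one item passed, 4 with none passed.
example = [(True, True)] * 4 + [(True, False)] * 3 + [(False, False)] * 4
total = score_test(example)
```

Under this rule a skill with only one item passed counts as a fail, so
the hypothetical subject above is credited with four skills.
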

Results

Since each item pair was intended to measure a different skill, a
measure of reliability of the scoring procedure for the test seemed to
be unavailable from standard statistics. A test of the degree to which
"inconsistency" (+- or -+) differed from "consistency" (++ or --) in
the measure of each skill more than would be expected by chance was
needed. The percent consistency scores, with a high proportion
reaching a consistency index of .70, were obtained by adding the
number of ++ and -- responses and dividing by the total. Correlations
of .86, .86, .92, .98, .98, .88, .96, .94, .66, .65, and .99 for each
of the 11 skills, respectively, were found for paired test-retest
listening scores. Correlations of .80, .88, .90, .96, .98, .90, .95,
.92, .72, .78, and .96 for each of the 11 skills, respectively, were
found for paired test-retest reading scores.
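
The percent consistency index can be illustrated as follows. The sketch
assumes that each subject's two item results for a skill are stored as
a pair of booleans; the function name and the sample data are
hypothetical.

```python
from typing import Sequence, Tuple


def percent_consistency(item_pairs: Sequence[Tuple[bool, bool]]) -> float:
    """Proportion of subjects whose two item results agree (++ or --)."""
    if not item_pairs:
        raise ValueError("at least one subject is required")
    agreeing = sum(1 for first, second in item_pairs if first == second)
    return agreeing / len(item_pairs)


# Hypothetical results of five subjects on the two items of one skill.
pairs = [(True, True), (False, False), (True, False), (True, True), (False, True)]
index = percent_consistency(pairs)  # three of five pairs agree
```

An index computed this way would be compared against the .70 level
mentioned in the text.
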

Evidence from the study of listening skills will be reported relating
to the existence of a hierarchy, although data have also been collected
and analyzed from the Reading Test. The listening skills were ranked
by difficulty level in terms of the ordered probabilities, i.e.,
averages of correct (pass) responses. Difficulty levels are summarized
in Table 1. For the second grade subjects, the skill with the greatest
probability of correct responses is Skill 4; the skill with the second
highest probability of correct responses is Skill 5; and the skill with
the lowest probability of correct responses is Skill 9. Inspection of
the rankings across the four levels shows that the difficulty of the
skills differed by grade level.

Although it would seem possible to obtain an average ranking of
difficulty for each skill at the four grade levels, the per cent consistency
scores in the measure of each skill suggested that some items were not
adequately measuring the skill. Therefore, an attempt was made not
only to inspect difficulty levels for skills for each grade but also to
further analyze the nature of the skills.

Since an attempt had been made to arrange the skills in a hierarchi- 
cal order from more simple to more complex, it was considered im- 
portant to investigate possible bias in the effects of testing order. Cor- 
rect responses might depend on position order rather than on order of 
difficulty. Therefore, it seemed reasonable to test the additional 
hypothesis of the extent to which ordering was a function of testing. 

However, when items in the test were presented in scrambled order 
to a group of students, the difficulty level by ranks of the skills was
not significantly different from the original order of presentation. 

In Table 2 the results are shown for the scrambled order of the test. 

Since the data were available in terms of rank orders, Spearman's
rank-difference method was applied and yielded correlations between
the two rankings of .97, .95, .92, and .95 for grades two, five,
eight, and eleven, respectively.
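
For readers who wish to reproduce this kind of comparison, the
rank-difference coefficient can be computed as
rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), where d is the difference
between the two ranks of a skill and n is the number of skills. The
sketch below assumes untied ranks; the two rankings shown are
hypothetical and are not the values of Table 1 or Table 2.

```python
from typing import Sequence


def spearman_rank_difference(ranks_a: Sequence[int], ranks_b: Sequence[int]) -> float:
    """Spearman's rho from two untied rankings of the same n objects."""
    n = len(ranks_a)
    if n != len(ranks_b) or n < 2:
        raise ValueError("rankings must have equal length of at least 2")
    sum_d_squared = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * sum_d_squared / (n * (n * n - 1))


# Hypothetical difficulty ranks of 11 skills under two presentation orders.
original_order = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
scrambled_order = [2, 1, 3, 4, 5, 6, 7, 8, 9, 11, 10]
rho = spearman_rank_difference(original_order, scrambled_order)
```

A value of rho near 1.0 indicates that the two presentation orders
produce nearly the same difficulty ranking.
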

                                TABLE 2

          DIFFICULTY LEVEL BY RANKS 1-11 FOR SKILLS PRESENTED
             IN SCRAMBLED ORDER BASED ON PROBABILITIES OF
              CORRECT RESPONSES FOR LISTENING MEASURES
               AT GRADES TWO, FIVE, EIGHT, AND ELEVEN

                                         Grade 2  Grade 5  Grade 8  Grade 11
Skill  Skill Names                        n = 84  n = 265  n = 124   n = 96

  1    Identifying stated main idea          5        8        8        4
  2    Recognizing examples by detail        3        5        1        1
  3    Reinstating sequence of ideas         6        4       10        8
  4    Inferring main idea from specifics    1       10       11        2
  5    Identifying mood                      2        1        ?        3
  6    Applying standards to judge
         persuasion                          8        7        9        3
  7    Predicting sequences of thought      10        3        3        7
  8    Inferring connotative word meaning    4        2        2       10
  9    Identifying sequence
         inconsistencies                    11        9        7        6
 10    Inferring speaker's purpose           7       11        4        ?
 11    Judging logical validity              ?        ?        ?        ?

If a hierarchy of skills exists, the conditional probabilities of correct
responses should indicate the degree to which predictable relationships
obtain among the skills. Table 3 presents the probability of obtaining
a correct response to Skill X2 given that Skill X1 is mastered. The
probabilities shown in Table 3 represent the degree to which the
attainment of one skill can be predicted from the attainment of another
skill. For example, inspection of Table 3 shows that the probability
for success on Skill 5, given that Skill 4 had been achieved, was .82;
the probability for success on Skill 2, given that Skill 4 had been
achieved, was .73; but the probability for success on Skill 9, given
that Skill 4 had been achieved, was only .03. Thus, weaker
relationships of predictability are seen between skills which are
higher in the hierarchy.

Since meaningful probabilities are predicted when Skill X1 precedes
Skill X2 in the hierarchy, the values lying above the diagonal will
yield additional significant information regarding successive pairs of
lower- and higher-level skills. Specifically, if Skill 4 was achieved,
the probability of success on Skill 5 was .82; if Skill 5 was achieved,
the probability of success on Skill 8 was .67. It may be noted that the
values of conditional probability shown here are not measures of
difficulty, and that these values may vary independently of the
difficulty measure applicable to any given skill. The predictability of
a lower-level to higher-level relationship might actually be .00 (and
it may be noted that some few such values were obtained) without
influencing the difficulty measure per se. Thus, the values of
conditional probability ranging up to .82 indicate high degrees of
predictability.

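
A table of conditional probabilities of this kind can be built from
pass-fail records as sketched below. The layout (one row of 0/1 skill
scores per subject), the function names, and the sample records are
assumptions made for illustration; they are not the study's data.

```python
from typing import List, Optional, Sequence


def conditional_pass_probability(
    results: Sequence[Sequence[int]], given: int, target: int
) -> Optional[float]:
    """Estimate P(target skill passed | given skill passed).

    Returns None when no subject passed the given skill.
    """
    passed_given = [row for row in results if row[given] == 1]
    if not passed_given:
        return None
    return sum(row[target] for row in passed_given) / len(passed_given)


def conditional_matrix(results: Sequence[Sequence[int]]) -> List[List[Optional[float]]]:
    """matrix[i][j] = P(skill j passed | skill i passed)."""
    k = len(results[0])
    return [
        [conditional_pass_probability(results, i, j) for j in range(k)]
        for i in range(k)
    ]


# Hypothetical pass-fail records: 4 subjects, 3 skills ordered low to high.
records = [[1, 1, 0], [1, 0, 0], [1, 1, 1], [0, 0, 0]]
matrix = conditional_matrix(records)
```

With skills indexed from lower to higher, the entries above the
diagonal (j greater than i) correspond to the lower-to-higher
predictions discussed in the text.
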
As skills get higher in the hierarchy, the probability of getting them
correct decreases. Specifically, the decreasing probability of correct
responses to higher-level skills shows, in general, that as skills
become more widely separated in the hierarchy, the amount of transfer
among them diminishes. Consequently, the patterns of conditional
probabilities of correct responses indicate that success on any
specified skill is predictable from success on skills lower in the
hierarchy. The conditional probabilities of correct responses of the
skills for grades five, eight, and eleven are given in Tables 4, 5,
and 6.

Finally, in order to obtain another measure of ordered relationships
of the skills, the data were analyzed using the Guttman Scalogram
Analysis technique. Ranking scores from highest to lowest, and ranking
skills from most favorable to least favorable, subjects with the
highest scores--highest being most favorable--would have answered only
the most favorable items; those scoring low would have answered only
the least favorable items. The analysis yields a coefficient of
reproducibility which indicates how well an individual's response
pattern can be predicted knowing his total score. The evidence from
such an analysis appeared to support further the theoretical
prediction that: (1) a particular skill might transfer positively to
an adjacent skill in the hierarchy and (2) the successful performance
of a skill would insure the successful performance of subordinate
skills in the hierarchy. Reproducibility coefficients for each of the
grades studied are presented in Table 7.

                         TABLE 7

        REPRODUCIBILITY COEFFICIENTS OF LISTENING
          MEASURES WITH SUBJECTS RESPONDING AT
          GRADES TWO, FIVE, EIGHT, AND ELEVEN

                     Reproducibility
        Grade          Coefficient

          2               .88            (n = 84)
          5               .86            (n = 265)
          8               .80            (n = 124)
         11               .83            (n = 96)

Guttman (1944) suggested that a reproducibility coefficient value of .90
is an acceptable lower limit. Edwards (1957) indicated that a value of
.85 shows the per cent accuracy with which responses to various
statements can be reproduced from total scores. Since perfect scales exist
only as ideal models, it is particularly useful to determine the extent
to which success or failure for subjects with known test scores can be
predicted to fit the model of an ordered scale.
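
A coefficient of reproducibility can be computed as
1 - (number of errors) / (subjects x skills). The study does not state
which error-counting convention was applied, so the sketch below
assumes one common convention: a subject with total score s is
predicted to pass exactly the s easiest skills, and every response that
departs from that prediction counts as one error. The function name
and the sample records are hypothetical.

```python
from typing import Sequence


def reproducibility_coefficient(results: Sequence[Sequence[int]]) -> float:
    """Guttman-type reproducibility for 0/1 pass-fail records.

    Assumed convention: skills are ordered from easiest to hardest by
    pass count, and a subject with total score s is predicted to pass
    the s easiest skills. Each mismatch with that pattern is an error.
    """
    n_subjects = len(results)
    n_skills = len(results[0])
    pass_counts = [sum(row[j] for row in results) for j in range(n_skills)]
    # Easiest skill first; ties keep their original order (sorted is stable).
    order = sorted(range(n_skills), key=lambda j: -pass_counts[j])
    errors = 0
    for row in results:
        predicted_pass = set(order[: sum(row)])
        for j in range(n_skills):
            predicted = 1 if j in predicted_pass else 0
            if row[j] != predicted:
                errors += 1
    return 1 - errors / (n_subjects * n_skills)


# Hypothetical records: one subject ([0, 1, 0]) departs from the ideal pattern.
records = [[1, 1, 1], [1, 1, 0], [0, 1, 0], [0, 0, 0], [1, 0, 0]]
coefficient = reproducibility_coefficient(records)
```

Other conventions for counting scale errors exist and can yield
somewhat different coefficients from the same records.
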

Discussion 

Data were obtained from a criterion-referenced test of listening
comprehension and reading comprehension which defined and specified
11 skills to be measured. Definitions were specified in terms of human
performance criteria. Each item was designed to measure a particular
class of performance. Thus the basic form of measurement was pass
or fail. Two items, measuring a class of performance or skill, seemed
sufficient, since the operation of unknown factors, varying randomly in
their effects, would not be expected to occur twice in exactly the same
way.

Validity in the sense of representativeness of what was measured
(content validity) was a critical consideration in the design of each item. 
Thus, the content of the test may be regarded as an explication of the 
eleven defined performance objectives. 

Concerning the stability of test measures, differences were obtained
as an index of consistency for measures of listening and reading, as
shown by pass-fail measures on two items for each skill, with a high
proportion reaching a level of consistency of .70. As to the reliability
of test measures, correlations of .86, .86, .92, .98, .98, .88, .96,
.94, .66, .65, and .99 for each of the 11 skills, respectively, were
found for paired test-retest listening scores. Correlations of .80, .88,
.90, .96, .98, .90, .95, .92, .72, .78, and .96 for each of the 11 skills,
respectively, were found for paired test-retest reading scores. In the
matter of the possible effects of testing order, correlations of .97,
.95, .92, and .95 for grades two, five, eight, and eleven, respectively,
were found between an original testing order and a scrambled testing
order of listening measures, indicating no serious bias arising from
this variable.

With respect to the sequence and transfer among the skills,
conditional probabilities of correct responses to listening measures
indicated patterns of predictable relationships of lower-level to
higher-level skills at all four grade levels studied. Finally, in a
measure obtained to show the extent of the relationships among the
skills, scalogram analyses yielded reproducibility coefficients of
.88, .86, .80, and .83 for grades two, five, eight, and eleven,
respectively, indicating the extent to which test scores can be
predicted to fit the model of ordered patterns of difficulty.




The results were analyzed in order to study the characteristics of
the test. Since a measure of reliability of the scoring procedure seemed
unavailable, per cent consistency scores were computed. Although a
reasonably high proportion of pass-fail measures on two items for
each skill reached a consistency index of .70, two items can hardly be
expected to provide a stable indication of performance. A revised test,
used in a current study, included not only the construction of a third
item to measure each of the skills but also the modification of the
scoring procedure for certain items and the alteration of items whose
per cent consistency was less than .70. However, it should be noted that
the skills were shown to have decidedly satisfactory test-retest values.

The theoretical assumption that the skills would exhibit relationships
of ordering compatible with the hypothesis of transfer from lower levels
to higher levels was given support from evidence from (1) an analysis
of the conditional probabilities of correct responses, and (2) a
Guttman-type analysis. Although the results from these analyses were
highly encouraging, more refined analyses are needed to provide
improved prediction. The results highlight the need for analytic
studies of language skills which would verify a specific causal
relationship among the skills here identified as "higher" and "lower,"
as well as the possibility of studies of transfer of learning from
other subordinate skills not yet identified.